Background: With the advent of array-based techniques to measure methylation levels in primary tumor samples, systematic investigations of methylomes have been performed widely on a large number of tumor entities. Most of these approaches do not measure methylation in individual cells but rather in bulk tumor sample DNA, which contains a mixture of tumor cells, infiltrating immune cells and other stromal components. This raises questions about the purity of a given tumor sample, given the varying degrees of stromal infiltration in different entities. Previous methods to infer tumor purity require or rely on matching control samples, which are rarely available. Here we present a novel, reference-free method to quantify tumor purity, based on two Random Forest classifiers, which were trained on ABSOLUTE as well as ESTIMATE purity values from TCGA tumor samples. We subsequently apply this method to a previously published, large dataset of brain tumors, showing that these models perform well in datasets that have not been characterized with respect to tumor purity.

Results: Using two gold standard methods to infer purity (the ABSOLUTE score based on whole genome sequencing data and the ESTIMATE score based on gene expression data), we have optimized Random Forest classifiers to predict tumor purity in entities that were contained in the TCGA project. We validated these classifiers using an independent test data set and cross-compared them to other methods which have been applied to the TCGA datasets (such as ESTIMATE and LUMP). Using Illumina methylation array data of brain tumor entities (as published in Capper et al. (Nature 555:469-474, 2018)), we applied this model to estimate tumor purity and found that subgroups of brain tumors display substantial differences in tumor purity.

Conclusions: Random Forest-based tumor purity prediction is a well-suited tool to extrapolate gold standard measures of purity to novel methylation array datasets. In contrast to other available methylation-based tumor purity estimation methods, our classifiers do not need a priori knowledge about the tumor entity or matching control tissue to predict tumor purity.